Field
The present disclosure relates to media processing equipment, and more specifically, to systems and methods for generating captions and summarizing media files.
Related Art
With the increased prevalence of recording devices, video and audio may be used to track what has happened, where it has happened and to whom it has happened. For example, surveillance videos and/or life-logging videos may be used to track events occurring at a location or in a person's life.
However, reviewing and navigating video or audio content from such devices may be a time-consuming process, particularly for long media files that are captured for the purpose of tracking without any editing. For example, surveillance videos may have hours of inaction or recurring static action that must be reviewed to find an event that only lasts a few minutes. Related art systems may allow rapid scanning of the media files, but do not allow for automatic generation of captions and summarization of long, continuous media files.